The data set was provided by QUT.
In this part we identify the day of the week for each pickup and count the pickups per day, so that we can find the busiest and quietest days of the week. The data contain the Uber pickup records in New York from May 2014 to September 2014. First, we take one month (May) as an example. We add a day column to the data set, then use ggplot to display the count by day. The table shows that Thursday has the most pickups and Monday the fewest.
library(ggplot2)
uber2<- read.csv("/Users/shung47/Documents/semester1 2017/Data manipulation/uber data/uber-raw-data-may14.csv")
uber<-uber2
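# Date.Time looks like "5/1/2014 0:02:00", so the four-digit year needs %Y;
# %y would read only "20" (year 2020) and shift every weekday label.
uber$day<-weekdays(as.Date(uber$Date.Time,format="%m/%d/%Y"))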
fac<-factor(uber$day, levels=c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Sunday"))
fac_table<-table(fac)
uber_t<-data.frame(fac_table)
names(uber_t)=c("day","count")
uber_t
## day count
## 1 Monday 51251
## 2 Tuesday 60861
## 3 Wednesday 91185
## 4 Thursday 108631
## 5 Friday 85067
## 6 Saturday 90303
## 7 Sunday 77218
ggplot(uber_t,aes(x=day,y=count,group=1))+geom_point(alpha=0.5)+geom_line()
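The weekday step depends on how the date string is parsed, so it can help to check it on a few values first. The sketch below uses three made-up timestamps in the same layout as Date.Time (they are illustrative, not rows from the data set) and assumes an English locale, since weekdays() returns locale-dependent names.

```r
# Hypothetical timestamps in the same "month/day/year hour:min:sec" layout
stamps <- c("5/1/2014 0:02:00", "5/4/2014 13:10:00", "5/5/2014 8:45:00")

# %Y reads the four-digit year; as.Date ignores the trailing time part
dates <- as.Date(stamps, format = "%m/%d/%Y")

# Ordered factor so that tables and plots run Monday to Sunday
# (assumes English day names; other locales would need other levels)
day_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")
days <- factor(weekdays(dates), levels = day_levels)

# One row per weekday, including days with zero pickups
day_counts <- as.data.frame(table(days))
names(day_counts) <- c("day", "count")
print(day_counts)
```

Printing the parsed dates next to the original strings is a quick way to confirm that the year came through as 2014 before counting the full data set.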
To find the trend, we count the pickups in each month. First, we import the other months and use “nrow” to count the records in each month, then combine the counts into one data frame. Finally, we use ggplot to show the result. The number of Uber pickups increased every month from May to September.
uber3<- read.csv("/Users/shung47/Documents/semester1 2017/Data manipulation/uber data/uber-raw-data-jun14.csv")
uber4<- read.csv("/Users/shung47/Documents/semester1 2017/Data manipulation/uber data/uber-raw-data-jul14.csv")
uber5<- read.csv("/Users/shung47/Documents/semester1 2017/Data manipulation/uber data/uber-raw-data-aug14.csv")
uber6<- read.csv("/Users/shung47/Documents/semester1 2017/Data manipulation/uber data/uber-raw-data-sep14.csv")
May<-nrow(uber2)
Jun<-nrow(uber3)
Jul<-nrow(uber4)
Aug<-nrow(uber5)
Sep<-nrow(uber6)
Total<-rbind(May,Jun,Jul,Aug,Sep)
#mon<-c("May","Jun","Jul","Aug","Sep")
#Total2<-cbind(mon,Total)
dt<-data.frame(Total)
#names(dt)=c("Month","Count")
dt
## Total
## May 564516
## Jun 663844
## Jul 796121
## Aug 829275
## Sep 1028136
#ggplot(dt,aes(x=Month,y=Count,fill="red"))+geom_bar(stat="identity")+labs(x="Month",y="Count")
# Use the month names (the row names of dt) on the x axis, kept in calendar order
ggplot(dt,aes(x=factor(rownames(dt),levels=rownames(dt)),y=Total))+geom_bar(stat="identity")+labs(x="Month",y="Count")
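The monthly counts can also be kept in a single data frame with an explicit month column, which avoids relying on row names when plotting. This is a minimal sketch: the small data frames stand in for uber2 to uber6, and the names monthly and counts are placeholders introduced here.

```r
# Small stand-ins for the monthly data frames (row counts are made up)
monthly <- list(
  May = data.frame(Lat = c(40.7, 40.8)),
  Jun = data.frame(Lat = c(40.7, 40.8, 40.6)),
  Jul = data.frame(Lat = c(40.7, 40.8, 40.6, 40.9))
)

# One row per month; the factor levels keep calendar order on a plot axis
counts <- data.frame(
  Month = factor(names(monthly), levels = names(monthly)),
  Count = vapply(monthly, nrow, integer(1), USE.NAMES = FALSE)
)
print(counts)
```

With ggplot2 loaded, ggplot(counts,aes(x=Month,y=Count))+geom_bar(stat="identity") would then label each bar with its month.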
Next, we look at the time of day. We remove the date, minutes and seconds from Date.Time, use factor to put the hours in order, then use ggplot to draw a line plot of the count by hour. The results indicate that the fewest pickups occur at 2-3 am, after which the count increases until 7 am. After a slight fall during the morning, it increases again until 5-6 pm (hour 17), which is the peak for Uber pickups. It then drops until midnight, apart from a small rise at 9 pm.
uber_tt<-sub("[0-9]/[0-9]?[0-9]/2014 ","",uber$Date.Time)
uber_a<-sub(":[0-9][0-9]:00","",uber_tt)
uber$hour<-uber_a
#uber_time<-aggregate(x=list(amount=uber$hour), FUN=length, by=list(hour=uber$hour))
fac2<-factor(uber$hour,levels=c("0","1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23"))
#uber_time2<-uber_time[order(uber_time$hour),]
fac2_t<-table(fac2)
dt2<-data.frame(fac2_t)
names(dt2)=c("hour","count")
dt2
## hour count
## 1 0 11910
## 2 1 7769
## 3 2 4935
## 4 3 5040
## 5 4 6095
## 6 5 9476
## 7 6 18498
## 8 7 24924
## 9 8 22843
## 10 9 17939
## 11 10 17865
## 12 11 18774
## 13 12 19425
## 14 13 22603
## 15 14 27190
## 16 15 35324
## 17 16 42003
## 18 17 45475
## 19 18 43003
## 20 19 38923
## 21 20 36244
## 22 21 36964
## 23 22 30645
## 24 23 20649
ggplot(dt2, aes(x=hour,y=count,group=1))+geom_line()+geom_point()+labs(x="time")
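As an alternative to removing the date and minutes with sub, the hour can be parsed from the timestamp directly. This is a sketch under the assumption that Date.Time has the layout month/day/year hour:minute:second. The timestamps below are made up, and the time zone is fixed to UTC only to keep the parse independent of the local setting.

```r
# Hypothetical timestamps in the assumed Date.Time layout
stamps <- c("5/1/2014 0:02:00", "5/1/2014 17:45:00", "5/12/2014 17:05:00")

# Parse the full timestamp, then pull out the hour field as an integer
parsed <- as.POSIXlt(stamps, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
hours <- parsed$hour

# Factor with all 24 levels so hours with no pickups still appear as zero
hour_counts <- as.data.frame(table(factor(hours, levels = 0:23)))
names(hour_counts) <- c("hour", "count")

# Show only the hours that occur in this small example
print(hour_counts[hour_counts$count > 0, ])
```

Parsing the timestamp does not depend on the month having one digit or the seconds being 00, which the sub patterns assume.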
We import the New York map with “get_map”, then plot the pickup density on the map, with one panel for each day of the week. According to the result, Manhattan has a higher pickup density than the other areas. However, the differences between days are not as obvious as in the line graph in the first part.
library(ggmap)
map <- get_map(location = 'New york', zoom = 12)
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=New+york&zoom=12&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=New%20york&sensor=false
NYmap<-ggmap(map)
NYmap
NYmap+stat_density2d(data=uber,aes(x=Lon, y=Lat, fill = ..level.., alpha=..level..),geom="polygon",size=2,bins=10)+scale_fill_gradient("Density")+scale_alpha(range = c(.4, .75), guide = FALSE)+guides(fill = guide_colorbar(barwidth = 1.5, barheight = 10))+facet_wrap(~day)
## Warning: Removed 48058 rows containing non-finite values (stat_density2d).
This part is similar to the previous one, but the panels show months instead of days. There is no obvious difference between the months on the map.
map <- get_map(location = 'New york', zoom = 12)
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=New+york&zoom=12&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=New%20york&sensor=false
NYmap<-ggmap(map)
uber2$month<-"May"
uber3$month<-"Jun"
uber4$month<-"Jul"
uber5$month<-"Aug"
uber6$month<-"Sep"
uberA<- rbind(uber2,uber3,uber4,uber5,uber6)
NYmap+stat_density2d(data=uberA,aes(x=Lon, y=Lat, fill = ..level.., alpha=..level..),geom="polygon",size=2,bins=10)+scale_fill_gradient("Density")+scale_alpha(range = c(.4, .75), guide = FALSE)+guides(fill = guide_colorbar(barwidth = 1.5, barheight = 10))+facet_wrap(~month)
## Warning: Removed 434717 rows containing non-finite values (stat_density2d).
This part gives the most valuable information. We can see where and at what time of day people call Uber most frequently. Uber drivers can follow this pattern to increase their chance of getting passengers.
map <- get_map(location = 'New york', zoom = 12)
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=New+york&zoom=12&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=New%20york&sensor=false
NYmap<-ggmap(map)
NYmap+stat_density2d(data=uber,aes(x=Lon, y=Lat, fill = ..level.., alpha=..level..),geom="polygon",size=2,bins=10)+scale_fill_gradient("Density")+scale_alpha(range = c(.4, .75), guide = FALSE)+guides(fill = guide_colorbar(barwidth = 1.5, barheight = 10))+facet_wrap(~hour)
## Warning: Removed 48058 rows containing non-finite values (stat_density2d).
From the data, we know when and where people called Uber most. Furthermore, the trend in pickups over these months can be observed, which can be used to predict passenger patterns in the future. Based on the prediction, Uber can allocate drivers to satisfy customer demand and maximize profit.